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Abstract. A new global estimate of surface turbulent fluxes, latent heat flux (LE) and sensible heat flux (H), and gross primary 
production (GPP) is developed using a machine learning approach informed by novel remotely sensed Solar-Induced Fluorescence 
(SIF) and other radiative and meteorological variables. This is the first study to jointly retrieve LE, H and GPP using SIF 
observations. The approach uses an artificial neural network (ANN) with a target dataset generated from three independent data 
sources, weighted based on triple collocation (TC) algorithm. The new retrieval, named Water, Energy, and Carbon with Artificial 
Neural Networks (WECANN), provides estimates of LE, H and GPP from 2007 to 2015 at 1° x 1° spatial resolution and on 
monthly time resolution. The quality of ANN training is assessed using the target data, and the WECANN retrievals are evaluated 
using eddy covariance tower estimates from FLUXNET network across various climates and conditions. When compared to eddy 
covariance estimates, WECANN typically outperforms other products, particularly for sensible and latent heat fluxes. Analysing 
WECANN retrievals across three extreme drought and heatwave events demonstrates the capability of the retrievals in capturing 
the extent of these events. Uncertainty estimates of the retrievals are analysed and the inter-annual variability in average global 


and regional fluxes show the impact of distinct climatic events — such as the 2015 El Nifio - on surface turbulent fluxes and GPP. 


1 Introduction 


Turbulent fluxes from the land surface to the atmosphere, particularly sensible heat flux (H) and latent heat flux (LE), and plants 
carbon uptake characterized by gross primary production (GPP) are key to understanding ecosystem response to climate and the 
feedback on the overlying atmosphere, as well as constraining the global carbon, water and energy cycles. In recent years, there 
has been substantial effort towards estimating these variables from remote sensing observations at a global scale (see e.g. Fisher 
et al., 2008; Jiang and Ryu, 2016; Jiménez et al., 2009, 2011; Jung et al., 2009; Miralles et al., 201 1a; Mu et al., 2007; Mueller et 
al., 2011). Two typical approaches have been used to estimate these from remote sensing information. The first approach uses 
physically-based or semi-empirical models (e.g. the Priestley-Taylor or Penman-Monteith equations in the case of ET, or a light 
use efficiency model in the case of GPP) informed by remote sensing information (e.g. vegetation indices, infrared temperature, 
microwave soil moisture), often in combination with reanalysis meteorological forcing data (Fisher et al., 2008; Miralles et al., 


201 1a; Mu et al., 2007; Zhang et al., 2016b; Zhao et al., 2005; Zhao and Running, 2010). These approaches are sensitive to the 
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assumptions and imperfections of the underlying flux models. The second approach, uses machine learning (e.g. a model tree 
ensemble) to determine LE, H, and GPP from meteorological drivers and optical remote sensing data (Tramontana et al., 2016). 
Like all supervised machine learning models, this approach relies on a training dataset to determine the non-linear statistical 
relationships. In this case, in situ turbulent flux and GPP estimates from eddy-covariance towers are used (Beer et al., 2010; Jung 
et al., 2011). Such an approach relies implicitly on an assumption that a long temporal record of these variables at a small number 
of sites captures the full range of behavior and sensitivities of terrestrial ecosystems around the globe. In addition, extreme and 
therefore rare events may be difficult to capture based on the limited data availability. 

Alternatively, one can use a machine learning approach, such as an Artificial Neural Network (ANN), trained on globally- 
representative but imperfect estimates of the fluxes (such as those from models) to parameterize the non-linear statistical 
relationships between remote sensing observations and surface fluxes. This approach has been successfully used for global soil 
moisture retrieval (Aires et al., 2012; Kolassa et al., 2013, 2016; Rodriguez-Fernandez et al., 2015) and surface heat flux retrieval 
(Jiménez et al., 2009). Such ANNs require a target dataset for training. Climate model simulations of the relevant geophysical 
variable are usually used as the training dataset to facilitate subsequent data assimilation efforts (Aires et al., 2012; Kolassa et al., 
2013, 2016). However, the downside of this approach is that the resulting fluxes estimated by the ANN often exhibit some of the 
same biases as the simulations used to train the network (Rodriguez-Fernandez et al., 2015), even if improvements can be achieved 
such as a more realistic seasonal cycle as it is informed by the seasonal cycle of the remote sensing data (Jiménez et al., 2009). 
Previous studies show a strong relationship between the rate of photosynthesis and SIF observations and indicate that the plant 
fluorescence measurements can be a useful proxy for photosynthesis estimation (Flexas et al., 2002; Govindjee et al., 1981; Havaux 
and Lannoye, 1983; van Kooten and Snel, 1990; Krause and Weis, 1991; McFarlane et al., 1980; Toivonen and Vidaver, 1988; 
van der Tol et al., 2009). Recently, satellite observations of SIF have become available, opening new possibilities for the global 
monitoring of photosynthesis (Frankenberg et al., 2011, 2012, 2014; Guanter et al., 2012; Joiner et al., 2013; Schimel et al., 2015; 
Xu et al., 2015). 

SIF observations from the Global Ozone Monitoring Experiment—2 (GOME-2) instrument are shown to better track the seasonal 
cycle of GPP compared to typical high-resolution optically-based vegetation index estimates (Guanter et al., 2012, 2014; Joiner et 
al., 2014; Walther et al., 2016). SIF has also been shown to be a pertinent indicator of vegetation water stress (Lee et al., 2013). 
Moreover, a near-linear relationship between monthly SIF retrievals and GPP is found for different vegetation types which suggests 
that SIF estimates can strongly constrain GPP retrievals (Frankenberg et al., 2011). 

Recently, a new SIF product was developed from observations of the GOME-2 satellite using a new retrieval algorithm that 
disentangles three components from multispectral observations (Joiner et al., 2013). SIF retrievals are shown not to be strongly 
affected by cloud contamination and seasonal variabilities in aerosol optical depth (Frankenberg et al., 2014). More recently, 
remotely sensed SIF retrievals have been used to successfully provide estimates of GPP in cropland and grassland ecosystems 
(Guanter et al., 2014; Zhang et al., 2016a). SIF retrievals are also integrated with photosynthesis estimates from National Center 
for Atmospheric Research Community Land Model version 4 (NCAR CLM4) which result in significant improvement of the 
photosynthesis simulation (Lee et al., 2015). As GPP relates to plant transpiration through stomata regulation (Damour et al., 2010; 
DeLucia and Heckathorn, 1989; Dewar, 2002), and transpiration water fluxes dominate continental ET (Jasechko et al., 2013), the 
use of remotely sensed SIF has the potential to also better constrain estimates of the continental water (LE), and energy (H) cycles, 
in addition to carbon (GPP) cycle. 

In this study, we develop an ANN approach to retrieve monthly estimates of LE, H, and GPP at the global scale. The network uses 


remotely sensed solar-induced fluorescence (SIF) estimates in addition to other data including precipitation, temperature, soil 
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moisture, snow cover, and net radiation as inputs (predictor). To our knowledge, this is the first study that uses remotely-sensed SIF 
estimates at the global scale to retrieve LE and H surface turbulent fluxes along with GPP. 

Moreover, to reduce any errors, we introduce a Bayesian perspective to generate the target dataset for the ANN. Multiple estimates 
of LE, H and GPP are selected according to an a priori probability that reflects the quality and information content of the dataset at 
the particular pixel of interest (details are provided in Section 3.2). This approach enables us to generate a robust target dataset for 
remote sensing observations along with a statistical algorithm for the retrieval, while bypassing the need for a land surface model 
and radiative transfer scheme. We use the triplet of GLEAM, ECMWF and FLUXNET-MTE for training of LE and H; and the 
triplet of MODIS-GPP, ECMWE and FLUXNET-MTE for GPP training. 

This new global product is named WECANN (Water, Energy, and Carbon Cycle with Artificial Neural Networks). WECANN 
monthly estimates for the period 2007 — 2015 are provided on a 1° x 1° resolution grid and with units of W m? for LE and H, and 
gC m”? day"! for GPP. The spatial coverage of WECANN is presented in Figure $1. It includes all the land areas, except for 
Greenland, Antarctica, and any 1° x 1° pixel that has more than 75% water, snow or ice permanently. To estimate the fraction of 
water, snow and ice in each pixel we used the 0.05° x 0.05° MODIS-based Land Cover Type product (MCD12C1 v051) (NASA 
LP DAAC, 2016). 


2 Data 


The inputs of WECANN include six remotely sensed variables introduced in Section 2.2 and Table 2: SIF, net radiation, air 
temperature, soil moisture, precipitation, and snow water equivalent. These are used to retrieve LE, H, and GPP. Different 
observation and/or model based datasets are used as the training dataset, and are explained in Section 2.1 and summarized in Table 
1. All the data presented here are projected and gridded on a 1° x 1° geographic grid and averaged at monthly temporal resolution. 


Finally, independent datasets used for evaluation of the ANN retrievals are presented in Section 2.3. 


2.1 Training Datasets 


Four products are introduced in this section, and a triplet of them is used for training of each of the LE, H, and GPP (Section 3.2). 
For LE and H, training is performed based on GLEAM, FLUXNET-MTE, and ECWMF ERA HTESSEL. For GPP, training is 
performed on FLUXNET-MTE, ECWMEF ERA HTESSEL, and MODIS-GPP. Table | summarizes the characteristics of the 


training datasets used here. 


2.1.1 GLEAM 


The Global Land Evaporation Amsterdam Model (GLEAM) is a set of algorithms to estimate terrestrial evapotranspiration using 
satellite observations (Martens et al., 2016; Miralles et al., 201 1a). GLEAM is a physically-based model composed of 1) a rainfall 
interception scheme, driven by rainfall and vegetation cover observations; 2) a potential evaporation scheme, calculated from the 
Priestley and Taylor (1972) equation and driven by satellite observations; and 3) a stress factor attenuating potential evaporation, 
based on a semi-empirical relationship between microwave VOD observations and root-zone soil moisture estimates (based on a 
running water balance for rainfall and assimilating satellite soil moisture). The data is provided on a 0.25° x 0.25° spatial resolution 
and daily temporal resolution and starts in 1980. GLEAM data have been used for studying land-atmosphere interactions, and the 
global water cycle (Guillod et al., 2014, 2015, Miralles et al., 201 1a, 2014a, 2014b). In this study, we use LE and H estimates from 


the latest version v3.0a (Martens et al., 2016). 
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2.1.2 FLUXNET-MTE 


The FLUXNET-MTE (Multi-Tree Ensemble) provides global surface fluxes at 0.5° x 0.5° spatial resolution derived from empirical 
upscaling of eddy-covariance measurements from the FLUXNET global network (Baldocchi et al., 2001). The MTE method used 
is an ensemble learning algorithm that enables learning diverse sequence of different model trees by perturbing the base learning 
algorithm (Jung et al., 2009, 2010, 2011). The data covers the period from January 1982 to December 2012 and can be used for 
benchmarking land surface models and assessment of biosphere gas exchange. We use LE, H, and GPP estimates from FLU XNET- 
MTE. 


2.1.3 ECMWE ERA HTESSEL 


The European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis (ERA) is a global 3D variational data 
assimilation (3DVAR) product that uses the Hydrology Tiled ECMWE Scheme for Surface Exchanges over Land (HTESSEL) in 
the forecast system. HTESSEL has a surface runoff component and accounts for a global non-uniform soil texture unlike the old 
TESSEL model (Balsamo et al., 2009). This is an offline model simulation, and HTESSEL is driven by meteorological forcing 
output from the forecast runs. Photosynthesis in the model is computed independently (i.e. with its own canopy conductance) from 
LE, so that the carbon cycle does not interact with the water cycle at the stomata level, adding errors. We use LE, H, and GPP 


estimates from ERA HTESSEL provided on a 0.25° x 0.25° geographic grid with daily temporal resolution. 


2.1.4 MODIS-GPP 


The Moderate Resolution Imaging Spectroradiometer (MODIS) sensor is onboard the sun-synchronous NASA satellites Terra 
(10:30 AM/PM overpasses) and Aqua (1:30 AM/PM overpasses). It provides 44 global data products (Justice et al., 2002) from 36 
spectral bands including visible, infrared and thermal infrared spectrums to monitor and understand Earth surface: atmosphere, 
land and ocean processes. The MODIS GPP/NPP project (MOD17) provides gross/net primary production estimates covering the 
whole land surface and is useful for analyzing the global carbon cycle and monitoring environmental change. The MOD17 
algorithm is based on a light-use efficiency approach proposed by (Monteith and Moss, 1977), which states that GPP is proportional 
to the product of incoming Photosynthetically Active Radiation (PAR), fraction of Absorbed PAR (fAPAR) and efficiency of 
radiation absorption in photosynthesis. We use the monthly MOD17A2 GPP product (Running et al., 2004; Zhao et al., 2005; Zhao 
and Running, 2010). MOD17A2 is available from 2000 until 2015, and provided on a 0.05° x 0.05° spatial resolution. 


2.2 Input Datasets 


Six sets of observations are used as input to the WECANN retrieval algorithm. These are selected in a way to provide necessary 
physical constraints on the estimates from the ANN. Table 2 lists the characteristics of each of the datasets, and they are briefly 


introduced in the following. 


2.2.1 Solar-Induced Fluorescence 


The GOME-2 instrument is an optical spectrometer onboard Meteorological Operational Satellite Program, MetOp-A and MetOp- 
B satellites, which were launched by the European Space Agency (ESA). GOME-2 was designed to monitor atmospheric ozone 
profile as wells as other trace gases and water vapor content. It senses Earth backscatter radiance and solar irradiance at a 40x40 
km spatial resolution (prior to July 2013 the spatial resolution was 40x80 km). Recently, the retrieval of Solar-Induced chlorophyll 
Fluorescence (SIF) using GOME-? observations in the 650-800 nm spectrum has been investigated (Joiner et al., 2013, 2016). We 
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use version 26 of the daily SIF product that uses the MetOp-A GOME-2 channel 4 with a ~0.5 nm spectral resolution and 


wavelengths between 734 and 758 nm. SIF estimates are provided on a geographic grid with 0.5° x 0.5° grid spacing. 


2.2.2 Net Radiation 


Net radiation is the main control of the rates of sensible and latent heat in wet environments and is closely related to PAR. The 
Clouds and Earth’s Radiation Energy System (CERES) is a suite of instruments which measure radiometric properties of solar 
reflected and Earth emitted radiation from the Top Of Atmosphere (TOA) to Earth surface, from three broadband channels at 0.3 
— 100 um. The CERES sensors are on board the Earth Observation Satellites (EOS) including Terra, Aqua and TRMM (Kato et 
al., 2013; Loeb et al., 2009). We use the net radiation estimates from Synoptic Radiative Fluxes and Clouds (SYN) product of 


CERES which are provided on a 1° x 1° geographic grid with monthly time resolution. 


2.2.3 Air Temperature 


The Atmospheric Infrared Sounder (AIRS) is a high spectral resolution spectrometer onboard the NASA Aqua satellite launched 
in 2002. It provides hyperspectral (visible and thermal infrared) observations for monitoring process changes in the Earth’s 
atmosphere and land surface, as well as for improving weather prediction. The AIRS instrument was designed to obtain 
atmospheric temperature and humidity profiles of every 1 km layer of the atmosphere. The accuracy of AIRS temperature 
observations is typically better than 1°C in the lower troposphere under clear sky condition (Aumann et al., 2003). We use daily 


temperature estimates from the lowest layer of AIRS level-3 standard product that is provided on a 0.5° x 0.5° geographic grid. 


2.2.4 Surface Soil Moisture 


The European Space Agency (ESA) Climate Change Initiative (CCI) program soil moisture (ESA CCI SM) is a multi-decadal 
(1980-2015) global satellite-observed surface soil moisture product. It merges observations from passive sensors (e.g., Scanning 
Multichannel Microwave Radiometer (SMMR), Special Sensor Microwave/Imager (SSM/I), AMSR-E) and active ones (e.g., the 
European Remote Sensing (ERS), Advanced Scatterometer (ASCAT)), based on a triple collocation error characterization (Dorigo, 
et al., in reivew; Liu, Parinussa, et al., 2011; Liu et al., 2012; Wagner et al., 2012). Here, we use daily data from the latest version, 


v2.3. ESA CCI SM is provided on a 0.25° x 0.25° geographic grid. 


2.2.5 Precipitation 


The Global Precipitation Climatology Project (GPCP) provides global daily precipitation estimates at 1° x 1° spatial resolution 
from Oct. 1996 to near present (Huffman et al., 2001). Global precipitation estimates from infrared and microwave instruments 
are combined with monthly gauge measurements to produce the daily estimates. In this study, v1.2 of the one-Degree Daily (1DD) 
product of GPCP is used and daily estimates are aggregated to monthly scales. Several studies have evaluated the GPCP 1DD 
product at global or regional scales, and results show that it has high accuracy and good agreement with independent in situ 
measurements and other global precipitation estimates (Gebremichael et al., 2005; Joshi et al., 2012; McPhee et al., 2005; Rubel 
et al., 2002). 


2.2.6 Snow Water Equivalent 


The GlobSnow project is developed by ESA, and provides long-term snow-related variables: Snow Water Equivalent (SWE) and 
areal Snow Extent (SE). It combines microwave-based retrievals of snow information (including Nimbus-7 SMMR, DMSP 


F8/F11/F13/F17 SSM/I(S) observations) and ground based station data through a data assimilation process and provides the SWE 
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and SE products at different temporal resolutions: daily, weekly and monthly (Pulliainen, 2006). Here, we use v2 of the daily L3A 
SWE product which is posted on a 25 km x 25 km EASE grid. 


2.3 Evaluation Datasets 
2.3.1 Eddy-Covariance Tower Estimates 


FLUXNET is a network of regional tower sites, which measure turbulent flux exchanges (water vapor, energy fluxes and carbon 
dioxide) between ecosystems and atmosphere (Baldocchi et al., 2001). FLUXNET comprises over 750 sites covering five 
continents. Measurements from the FLUXNET towers provide valuable information for evaluating satellite based retrievals of 
surface fluxes. In this study, FLUXNET measurements from the FLUXNET 2015, the La Thuile Synthesis dataset and the Large- 
scale Biosphere-Atmosphere (LBA) experiment in Brazil are used for evaluation (details are provided in section 4.4). 


FLUXNET 2015 tier 1 and tier 2 data were retrieved from (http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/). The data have 


been systematically quality controlled with a standard format throughout the dataset (http://fluxnet.fluxdata.org/data/fluxnet2015- 


dataset/data-processing/, (Pastorello et al., 2014)) and gap-filled using ERA meteorological forcing downscaling. 


From the Large-scale Biosphere-Atmosphere (LBA) experiment in Brazil, we use data from sites in Rond6nia at the edge of a 
deforested region (BR-Jil and BR-Ji2) and near SAo-Paulo (BR-Sp1). As the data did not span recent years, we instead use a 
climatology of the fluxes for comparison from 1999 to 2003. We note that, of course, the inter-annual variability in the region 
(such as El Nifio and La Nifia) could alter the seasonality and magnitude of the fluxes in the region. 


We also use data from the La Thuile Synthesis Dataset (http://fluxnet.fluxdata.org/data/la-thuile-dataset/) covering 24 sites. These 


data are part of the free-fair use version of the dataset. 

A total of 85 sites from the three datasets are selected for evaluation of WECANN retrievals spanning a large climatic and biome 
gradient (Fig. S2). For AmeriFlux towers, if measurements from both the FLUXNET 2015 dataset and the La Thuile dataset were 
available, we have used the FLUXNET 2015 data. We have only selected sites that had at least 24 month of continuous 
measurements during 2007-2015 years. Any site that would have fallen outside of the WECANN land mask (Fig. S1) is excluded 


(several sites in coastal regions). 


2.3.2 Basin Scale ET 


We use estimates of an independent water budget closure model across five major basins to evaluate WECANN retrievals at 
regional scales (Aires, 2014; Munier et al., 2014) . ET estimates from the budget closure approach satisfy a water budget closure 
with no residual; therefore, they can be used as a reference to evaluate WECANN ET estimates at basin scale. These basins include 
Amazon (4,680,000 km), Colorado (618,715 km’), Congo (3,475,000 km7), Mississippi (2,964,255 km?) and Orinoco (836,000 
km’). Details of the water budget estimate are provided in Munier and Aires, 2017, but in a nutshell they combine estimates of 
precipitation, evaporation, water storage and runoff to define a best estimate of the different fluxes and changes in storage, 


constrained by the water budget over the basin. Their analysis is carried out from 2002 through 2010. 


3. Methodology 
3.1 Artificial Neural Network Setup 


We developed an ANN retrieval algorithm to estimate the surface fluxes (LE and H) and GPP based on our six sets of input 
observations: SIF, net radiation, air temperature, soil moisture, precipitation, and SWE (as described in Section 2.2). The ANN 


used here is a feedforward network consisting of three layers: (1) an input layer that directly connects to the input data, (2) one 
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hidden layer and (3) an output layer that produces the 3 output estimates. The number of neurons in the input and output layer is 
determined by the number of input and output variables, whereas for the hidden layer it has to be chosen according to the 
complexity of the problem (see below). The neuron output from each layer is fed to neurons in the subsequent layer through 
weighted connections. Each neuron output is the weighted sum of its inputs plus a bias, which is then subjected to a transfer 
function. In this study, we chose a tangent sigmoid transfer function for neurons in the hidden layer and a linear transfer function 
in the output layer. The change of the transfer function for the hidden layer (log sigmoid or tangent sigmoid) did not produce any 
significant changes in the retrievals (not shown), so we used the more common method. A schematic of the ANN architecture is 
provided in Fig. 1. 

The training step of the ANN aims at estimating the weights for each of the neuron connections, such that the mismatch between 
the ANN outputs and target estimates is minimized. For this, we used the mean squared error (MSE) as the cost function and a 
backpropagation algorithm to adjust the ANN weights. During the training, the network implicitly learns the coupling of the LE, 
H and GPP by using one set of neurons (with their respective weights and biases) to estimate the three variables. This is an 
advantage of using a machine learning technique that eliminates the need to define physical relationships between different 
variables. 

For the purpose of training, the target data is divided into three subsets: training, validation and testing constituting 60%, 20% and 
20% of the target data, respectively. In each iteration, the training subset is used to estimates the weights in the network, and the 
convergence of the training towards the target data is checked using the validation subset. When overfitting of the network weights 
to the training data occurs, the validation estimates start diverging from the target data and the training is stopped (early stopping). 
The weights from the last iteration before the occurrence of the divergence represent the final solution. The testing subset are used 
to assess the ANN performance after the training phase. 

As an additional measure to avoid overfitting, we repeated the training for several ANNs with an increasing number of neurons in 
the hidden layer (1 to 15). For 1 to 5 neurons, the R* value between the target data and the ANN estimates increased with an 
increasing number of neurons. For more than 5 neurons, little change in the skill was observed when increasing the number of 
hidden layer neurons (Fig. $3). Thus an ANN with 5 hidden layer neurons represents the simplest ANN that can converge to a 
solution and model the non-linear relationship between the satellite inputs and the surface flux estimates. 

To train the ANN, we used LE, H and GPP estimates from the years 2008-2010. The target dataset was generated through a triple 
collocation based merging of triplets of the flux estimates introduced in Section 2.1 (details are discussed in Section 3.2). After 
completion of the training, the performance of the ANN and its ability to generalize was evaluated using the LE, H and GPP target 
data from 2011. Finally, WECANN retrievals are evaluated against other global products and eddy covariance tower data. Results 


of these comparisons are presented in section 4. 


3.2 Target Dataset: A Bayesian prior using Triple Collocation 


One of the key issues in the design of an ANN to retrieve any geophysical variable is defining a good target dataset. One practice 
has been to use outputs from a land surface model as the target (Aires et al., 2005; Jiménez et al., 2013; Kolassa et al., 2013; 
Rodriguez-Fernandez et al., 2015). However, all observations and models contain random errors and biases. Therefore, the retrieval 
based on the ANN exhibits some of the biases of the original target dataset even if the ANN is able to make corrections to its 
original target data (e.g. correction of an imperfect seasonal cycle, as demonstrated by Jiménez et al., 2009). To address this issue, 
we use three datasets, which are sufficiently independent so that the training can learn from each dataset and benefit from all of 
them, synergistically. We implement a pseudo Bayesian training by probabilistically weighting the occurrence of each training 


dataset by its likelihood, and define a target dataset. The three datasets are listed in Table 1 for each variable. 


10 


15 


20 


25 


30 


35 


To define this prior distribution, we use the triple collocation (TC) technique. TC is a method to estimate the Root Mean Square 
Errors (RMSE) (and, if desired, correlation coefficients) of three spatially and temporally collocated measurements by assuming a 
linear error model between the measurements (McColl et al., 2014; Stoffelen, 1998). This methodology has been widely used in 
error estimation of land and ocean parameters, such as wind speed, sea surface temperature, soil moisture, evaporation, 
precipitation, fAPAR, and in the rescaling of measurement systems to reference system for data assimilation purposes 
(Alemohammad et al., 2015; D’Odorico et al., 2014; Gruber et al., 2016; Hain et al., 2011; Lei et al., 2015; Miralles et al., 2010, 
2011b; Parinussa et al., 2011), as well as in validating categorical variables such as the soil freeze/thaw state (McColl et al., 2016). 


The relationship between each measurement and the true value is assumed to follow a linear model: 
Xi = a; + B;t + Ej i= 1,2,3 (1) 


where X;'s are the measurements from the collocated system i (e.g. remote sensing observation, model output, etc.), t is the true 
value, a; and f; are the intercept and slope of the linear model, respectively. ¢; is the random error in measurement i and TC 
estimates the variance of this random variables in each measurement. By further assuming that the errors from the three 
measurements are uncorrelated (C ov(é;, gj) = 0, fori# j) and the errors are uncorrelated with the truth (Cov(e;,t) = 0), the 


RMSE of each measurement error can be calculated as (McColl et al., 2014): 
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in which Q;; is the (i*", j*") element of the covariance matrix between the three measurements. Since the triplet of datasets used 
for training each of the fluxes (see Table 1) is derived through different semi-empirical approaches with different sources of errors, 
the assumption of uncorrelated errors is more likely to be met. In the following, we will calculate the standard deviation of random 
error component of Equation (1) using TC for each of the surface fluxes, and use them as TC based errors of each product. 

The TC estimated errors for the surface fluxes and GPP are shown in Figs. S4-S6. The white regions represent missing retrievals 
or discarded negative estimates due to insufficient data record. For LE, high TC errors are found in the Amazon rainforest and 
tropical Africa for GLEAM, in Amazon rainforest and the Sahel for ECMWF, in Indian peninsula for FLUXNET-MTE and in U.S. 
Great Plains for ECMWF and FLUXNET-MTE. For H, beside the aforementioned regions, high TC errors are also found in 
Southeast Asia for GLEAM and ECMWF, and in northern Canada for FLUXNET-MTE. For GPP, MODIS and ECMWE have the 
highest errors in Amazon rainforest, ECMWF and FLUXNET-MTE have relatively higher errors in US Great Plains, and all three 
products have similar errors in Tropical Africa. 

There are several likely causes for these errors. For the FLUXNET-MTE data, the regions which are not covered by (many) 
FLUXNET eddy-covariance stations may result in larger uncertainties, and those regions for which interception is a large 
component of the LE flux as well (Michel et al., 2016). For the GLEAM and ECMWF data thick vegetation generally induces 
biases compared to the satellite observations, especially in tropical regions (Anber et al., 2015). 

Finally, we use the TC-based RMSE estimates at each pixel to compute the a priori probability (P;) of selecting a particular dataset 


in each pixel, if that pixel is used as part of the training dataset: 
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in which P; is the probability of selecting dataset i when sampling from three measurements. We assume that these probabilities 
are time independent as we are limited by the currently available duration of the input data; however, future versions will explore 


the use of seasonally varying probabilities. 


4. Results and Discussion 
4.1 Global Magnitude and Variability of LE, H and GPP 


In this section, we present and compare the retrievals of LE, H and GPP for the year 2011, which was not included in the training 
step of WECANN. Thus, it is used here to evaluate the ANN fit to the target values. 

Figure 2 illustrates the annual global average and scatterplots of retrievals vs target estimates. The spatial patterns of the WECANN 
retrievals are similar to expectations. The average global values in 2011 are 38.33 W m” for LE, 39.44 W m? for H, and 2.34 gC 
m” day *! (or 123.16 PgC yr!) for GPP. LE has the best R? (0.95) comparing to H (R?=0.89), and GPP (R?=0.90). The Root Mean 
Squared Difference (RMSD) of each of the retrievals with respect to the target estimates is as following: for LE, RMSD = 11.06 
W m”; for H, RMSD = 13.13 W m”; and for GPP, RMSD=1.22 gC m? day *. 

The seasonal variability and spatial pattern of the retrievals from 2011 are shown in Figs. 3 - 5. LE does not exhibit any variability 
over deserts, such as the Sahara and Arabian Peninsula, as expected (Fig. 3). Wet tropical forests exhibit subtle seasonal variability 
in LE. These spatial variabilities in the seasonal cycle reflect changes in the radiation, temperature, water availability during the 
dry season, soil nutrient, soil type conditions as well as leaf flushing (Anber et al., 2015; Morton et al., 2014, 2016; Restrepo- 
Coupe et al., 2013; da Rocha et al., 2009; Saleska et al., 2016). In contrast, seasonal variability dominated by radiation availability 
are noticeable in wet mid-latitude regions for both Northern and Southern Hemisphere, i.e., East Asia, Eastern U.S. and Australian 
North and East Coast with over 60 W m* difference between winter and summer months. One exceptional case is South Asia, 
where LE does not significantly rise in spring, likely due to the effects of the monsoonal climate. In Eastern South America, the 
ET estimates are relatively high compared to GPP estimates. This difference can be caused by either low water use efficiency or 
significant rain re-evaporation and soil evaporation. Moreover, the SIF relationship with GPP likely changes in C4 plants. However, 
we did not impose the C4/C3 delimitation in the ANN as it would be highly dependent on the quality of the classification map 
used. We note that all training products used here include C3/C4 delimitation and therefore the C3/C4 delimitation is implicit in 
the training dataset; therefore, can be learnt by the network. 

Seasonal variabilities in H (Fig. 4) are distributed in opposite pattern to LE, as expected. Deserts and dry regions i.e., the Sahara, 
Southwestern U.S. and Western Australia demonstrate much more seasonal variability than the rest of the world -given the strong 
water limitations there, the available energy converted into H becomes dictated by the seasonal cycle of solar radiation. In contrast, 
tropical rainforests (Amazon, Congo, Indonesia) exhibit limited seasonal variability. In mid-latitude energy-limited regions 
(Central/Eastern Europe, Easter US), H also reflects the course of available energy, and in more water-limited regimes (e.g. 
Western US and Mediterranean Europe), it reflects the interplay between soil dryness and available energy, with a peak between 
spring and summer for dry regions. 

The seasonal variability of GPP (Fig. 5) in Northern latitudes follows the availability of radiation in wet regions with a peak in 
summer and another in spring for dry regions, corresponding to both soil water availability and high incoming radiation. A clear 
East-West transition conditioned by water availability is observed in continental U.S. In tropics and subtropics, the response is 


diverse. The Amazon rainforest exhibits high GPP throughout the year with a peak between September and February in the wetter 
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part of the basin, following the dry season, consistent with the observations at eddy-covariance towers near Manaus and Santarem 
(Restrepo-Coupe et al., 2013; da Rocha et al., 2009). Compared to LE, substantial geographical variability is observed in the 
Amazon, because of the strong variabilities in soil type, green up, biodiversity and rooting depth. In the drier part of the basin, 
water availability controls the seasonal cycle of photosynthesis and the peak in GPP is observed in the wet season (DJFMA). In 
the Congo rainforest, GPP exhibits four seasons, with two wet and two dry ones, with substantial decrease in GPP during those 
dry spells. In Indonesia, GPP is steadier throughout the year, exhibiting high values year-round. Monsoonal climates over India, 
South-East Asia, Northern Australia and Central-Northern America are well captured with rapid rise in GPP following water 
availability. The highest GPP are observed in rainforests and the US agricultural Great Plains, in JJA for the latter. Northern latitude 


regions mainly exhibit substantial GPP in the summer and late spring, and small values throughout the rest of the year. 


4.2 Evaluation with FLUXNET Data 


Direct validation of the WECANN retrievals is challenging by the fact that no global, error-free estimates of LE, H and GPP are 
available. Remote sensing or model products such as those used for training have their own errors. When three datasets with 
uncorrelated errors (commonly assumed to be true if the sources of error in each dataset have no common physical origin) are 
available, triple collocation provides a valuable technique to evaluate large-scale datasets in the absence of a known truth. However, 
WECANN’s use of different training datasets will cause the presence of some correlated errors between WECANN retrievals and 
any of the datasets used for the training. Instead, we evaluate the retrievals by comparing them to data from a set of FLUXNET 
eddy-covariance towers. WECANN uses three training datasets which one of them (FLUXNET-MTE) is based on upscaling 
FLUXNET eddy-covariance tower estimates. This might cause some dependence between WECANN retrievals and the tower 
estimates. However, WECANN learns from all three training datasets collectively, and uses remote sensing observations as input. 
Therefore, this dependence is negligible. In situ estimates from eddy covariance towers with a footprint of a few 100 m to km may 
not be representative of the entire 1° x 1° pixel, and are known to have problems with energy closure (Foken et al., 2010). However, 
in the comparison against tower data the impact of large-scale climate variability and seasonality can still be seen even at different 
spatial scales. For instance, the phenology has a strong impact on the seasonal cycle of the LE, H and GPP and in the following 
examples, it is clearly highlighted when comparing different products to flux tower estimates. 

Summary of statistics across 85 FLUXNET sites are provided in Tables $1 — S3. Overall, WECANN performs better than other 
alternative global products. In particular, WECANN has the highest correlation for 76% of sites for LE, 54% of sites for H, and 
53% of sites for GPP. This high R? reflects the capacity of WECANN to correctly capture the seasonal cycle and interannual 
variability, as it is largely imposed by the remote sensing observations rather than by the statistical retrieval (Jiménez et al., 2009). 
One of the reasons for this is the presence of the SIF information in the ANN retrieval, which is directly related to GPP and plant 
transpiration (Frankenberg ef al., 2011). The RMSE of WECANN is lower than all other products at 71% of sites for LE, 46% of 
sites for H, and 51% of the sites for GPP. The bias is also reduced compared to other retrievals, even if some variability can be 
seen from site to site. 

Figure 6 shows a summary of the correlation coefficients presented in Tables S1 - S3 for each group of Plant Functional Types 
(PFTs). Each class has between 6 and 22 sites. WECANN has the best mean within each PFT class, and the smallest variability in 
most of the classes for all three variables. 

Figure 7 shows the comparison of monthly WECANN retrievals and three other global products’ estimates with the tower estimates 
across 5 select sites that span a range of climatic and vegetation coverage conditions. At the Oklahoma agricultural site (US-ARM), 


H and LE are well reproduced, yet dry year H is underestimated (Fig. Figure 7a). The GPP reported at the site very rapidly decays 
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at the end of the spring whereas the region is highly agricultural with sustained agriculture in the summer. The difference between 
the reported GPP and WECANN retrievals might be again due to the difference in the footprint of the two estimates. 

At the Brasschaat site, BE-Bra, Belgium (Fig. Figure 7b), LE is very well captured by WECANN, which captures the seasonal 
cycle well, yet misses some of the interannual variability. WECANN outperforms the other retrievals of LE and GPP and captures 
the GPP seasonal cycle very well compared to other products, which display too early GPP rise and overestimate summer GPP. 
Again, the SIF data provides independent useful data compared to other environmental information (radiation, temperature, 
vegetation indices) used by the other retrieval schemes. All retrievals strongly underestimate the reported eddy-covariance H. At 
this humid site though, the magnitude of the measured H is often higher or on the same order in the summer as LE. Given the high 
degree of urbanization around the site, it is most likely a reflection of the footprint of the eddy-covariance and the fact that it 
observes urbanized surfaces with high H. Indeed, the surface energy budget is not locally balanced and turbulent fluxes are higher 
than the observed net radiation minus ground heat flux. 

At the cold Finland site (FI-Hyy), WECANN very well captures the seasonal cycle of GPP and LE, as well as to a less extent H. 
WECANN better reproduces the seasonality, amplitude and interannual variability compared to other retrievals (Fig. Figure 7c). 
It also reflects the difficulties of retrieving fluxes in snow dominated regions. SIF has the great advantage that it is not directly 
sensitive to snow compared to vegetation indices for instance, which incorrectly attribute snowmelt and changes in observed 
ground color to photosynthesis onset (Jeong et al., 2017). 

At the monsoonal grassland site of Santa Rita, AZ, WECANN correctly captures the complex dynamics of H and LE at the site 
with some rain periods preceding the Monsoon period (Fig. Figure 7d). Yet, WECANN slightly underestimates LE and 
overestimates GPP. In fact, all products overestimate GPP in the dry and cold seasons. The landscape in the region is highly 
heterogeneous with denser vegetation in riparian zones, away from the tower location, which may explain the lower GPP value at 
the site compared to estimates of the larger-scale values. 

Finally, at the South African Mediterranean site, ZA-Kru, WECANN reproduces some of the dynamics of the observed H, yet is 
typically smoother (Fig. Figure 7e). It reasonably captures the LE dynamics, except for the suspect cold season increase reported 
at the tower in 2013 (like other products). All products overestimate the reported GPP, though WECANN is closest to the 
observations and better captures the seasonal dynamics compared to other products. 

Overall, across the different sites, the WECANN retrieval performs better than other products, especially in terms of the seasonality 
of LE, H and GPP. Several factors contribute to the improved retrieval of WECANN compared to other products, even at those 
smaller footprint sites. First, the SIF measurements that are directly correlated with GPP provide a better constraint on estimating 
LE, H and GPP. The ANN approach in WECANN also uses a novel training technique based on probabilistically merging different 
datasets to remove outliers from its target dataset. Therefore, WECANN retrievals learn collectively from the different datasets 


(and remote sensing observations) and are closer to the truth than each of the individual target datasets. 


4.3 Comparison against other remote-sensing based products 


In this section, we compare the WECANN-based estimates to other datasets used in the training to better understand how 
WECANN differs from those training data. Figure 8 shows the comparisons for LE, and indicates that our product has a relatively 
similar R? with the three products (R* = 0.96 with FLUXNET-MTE and ECMWF, and R? = 0.94 with GLEAM). However, the 
scatterplot with FLUXNET-MTE is more concentrated and aligned along the 1:1 line, further emphasizing the consistency between 
the two datasets (RMSD of 6.42 W m? for FLUXNET MTE versus 8.47 W m? and 9.72 W m? for GLEAM and ECMWF, 
respectively). Differences in spatial patterns shown in Fig. 8a-c reflect that WECANN exhibits smaller spatial differences with 
FLUXNET-MTE than GLEAM or ECMWE and such differences exhibit a narrower range between -10 and 10 W m’. FLUXNET- 
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MTE overestimates LE compared to WECANN in transitional tropical and subtropical regions and particularly over India, which 
are regions with few eddy-covariance towers. GLEAM exhibits substantial differences with our product particularly in regions 
dominated by seasonal water stress such as Brazilian savannas, the Horn of Africa, Central America, India and the subtropical 
humid part of Africa south of the Congo. In the Sahel, GLEAM LE is higher than our estimate and FLUXNET-MTE. The LE 
estimate of ECMWF is nearly always higher than our estimate with much higher values in the Congo, the Amazon, Southern Brazil, 
and Northern Canada. In Europe, where the ECMWF estimate should be best because of the frequent weather operational forecast 
checks and model adjustment in the region, the estimates are more similar. The differences and similarities of WECANN retrievals 
with the three target datasets is consistent with the error estimates from TC. For example, Fig. S4 shows that FLUXNET-MTE has 
the smallest error in LE estimates globally compared to GLEAM and ECMWF, other than across India. WECANN retrievals also 
have better agreement with FLUXNEWT-MTE. 

The differences in H estimates are more complex (Fig. 9). First, the R* between WECANN and the other datasets are slightly lower 
than for LE. ECMWF and FLUXNET-MTE yield higher R? with WECANN (0.92) while GLEAM has an R? of 0.87. GLEAM 
exhibits lower H in most of the Northern hemisphere, especially in seasonally dry regions, potentially due to its simple formulation 
of ground heat flux (G). H estimates are relatively higher over the Amazon and Congo but lower over Indonesia for GLEAM. In 
the Southern Sahara and northern Sahel as well as in Eastern Asia and Canada GLEAM has lower H compared to WECANN and 
FLUXNET-MTE. ECMWE exhibits higher values in seasonal dry regions such as Western US, Brazilian Savannas, Southern 
Congo, the Sahel compared to WECANN and smaller values in the Amazon, Indonesia, and over desert areas of the Sahara and 
Arabic peninsula as well as South East Asia. The GLEAM and ECMWFE H difference maps show many similar patterns: the Sahara, 
Eastern Europe, East Asia are underestimated, while Southern Africa and Eastern part of Amazon are overestimated. Similarly the 
errors patterns estimated from TC (Fig. $5) are consistent with the comparison of WECANN and target datasets. Figure S5 shows 
that ECMWF has higher errors in the Sahel, Southern Congo, and Brazilian Savana and GLEAM has higher errors in the Amazon, 
East Asia and Central Africa. 

The comparison between the GPP estimates shows significant differences (Fig. 10). WECANN compares the best against 
FLUXNET-MTE (R? = 0.93), with MODIS (R? = 0.91) and ECMWE (R? = 0.90) following. While all three products have similar 
R’, their spatial differences are distinct. In the Amazon, ECMWF and FLUXNET-MTE have larger GPP estimates compared to 
WECANN, while MODIS estimates are much smaller. In cold northern latitude regions of Siberia and Northern Canada, all three 
products have higher GPP than WECANN. In Congo, MODIS and FLUXNET-MTE have higher GPP, while ECMWFE has a lower 
one. In Central and Southwestern US, all three products tend to yield lower GPP. Comparison of these findings with the error 
estimates from TC (Fig. S6) shows that FLUXNET-MTE has the lowest errors globally, while ECMWE has the largest errors in 
the Amazon. 

Finally, we compare annual anomalies of WECANN retrievals and the three training datasets globally and in different climatic 
zones. The zones are defined as Polar (90° N - 60° N), Northern Hemisphere (NH) mid-latitude (60° N - 10° N), Tropics (10° N - 
15° S), and Southern Hemisphere (SH) mid-latitude (15° S - 60° S). Results are presented in Figure S7. WECANN anomalies are 
derived from the mean values between 2007 and 2015. However, not all the training products are available for this period. Therefore, 
their anomalies are calculated from their respective temporal domain which is 2007-2011 for FLUXNET-MTE, GLEAM and 
MODIS, and 2008-2011 for ECMWF. Anomalies from the four products have similar patterns in general, while their absolute 
values differ. Anomalies in GPP have better agreement across different products compared to LE and H. Evaluation of the 


discrepancies between these anomalies are beyond the scope of this paper and will be characterized in detail in a future study. 
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4.4 Extreme Events Assessment 


In order to further assess WECANN at regional scales, we analyse its capacity in capturing extreme events. We thus selected three 
major heatwave and drought events that occurred during the temporal coverage of WECANN product. These events are the Russia 
2010 heat wave, Texas 2011 drought and US Corn Belt 2012 drought. Figure 11 shows the percentage of average monthly 
anomalies with respect to mean values, for LE, H and GPP, in each of the three cases. The patterns reveal significant anomalies in 
all fluxes which is consistent with reported patterns. In summer 2010, a historical heatwave occurred over western Russia and 
resulted in all-time maximum temperature record in many locations (Dole et al., 2011). The extent of reduction in LE and increase 
in H derived from WECANN retrievals is consistent with estimates reported in the literature (Lau and Kim, 2012), with 10-15% 
increase in H and 15-20% reduction in LE. In early 2011, drought conditions developed in southern US, particularly in the states 
of Texas and Louisiana (Luo and Zhang, 2012). By April, most of Texas, Oklahoma, Louisiana and Arkansas were classified in 
the D4 drought condition (exceptional drought), and the situation continued throughout the summer and fall of 2011 as reported 
by US Drought Monitor (Svoboda et al., 2002). As Figure 11 reveals, the same spatial pattern is pronounced in the monthly 
anomalies derived from WECANN retrievals, emphasizing massive reduction in LE and GPP accompanied by high H over the 
region. 

Finally, an intense drought in the central US, particularly in the Corn Belt, occurred in 2012 and reduced maize yields by about 
25% and increased prices by 17—24% (Boyer et al., 2013; USDA, 2013). By mid-September 2012 almost two-thirds of the 
continental US was covered by drought, and different parts of US Corn Belt was categorized as either D3 (extreme drought) or D4 
(exceptional drought) condition. Figure 11 shows similar patterns in LE, H and GPP with significant positive anomaly in H (~20%) 


and reductions in LE and GPP (~20%), consistent with crop yield decrease. 


4.5 Basin Scale Evaluation 


We also assess the accuracy of WECANN ET retrievals using ET estimates from the independent water budget closure model 
introduced in Section 2.3.2. The analysis is carried out for years 2007 to 2010 that overlaps between WECANN retrievals and the 
water budget closure study. Figure 12 shows the relative absolute difference in ET estimates from WECANN compared to the ET 
estimates from the water budget study for each of the five basins (mean value and one standard deviation across 48 months). Mean 
absolute differences vary between a low of 5% in the Amazon and a larger 24% value in Colorado, while other three basins have 
a mean difference of 9%, 17%, and 20%. While the differences vary between a low and moderate range, it should be noted that the 
coarse spatial resolution of WECANN product causes a difference in the spatial averaging to get the basin level estimates of ET. 
Moreover, in the budget closure estimates only a single runoff (at the outlet of the considered basin) is used over the entire basin; 
therefore, large heterogeneous basins such as the Colorado and Mississippi have large uncertainties associated with them, as runoff 
does not correctly constrain the flux distribution over the entire basin. It is over those basins that the WECANN retrieval compare 
less favorably with these large-scale estimates. Downscaled version of those estimates would further help in the evaluation of ET 


products. 


4.6 Uncertainty Analysis of WECANN Retrievals 


One of the advantages of a statistical retrieval algorithm, in particular of ANNs, is that the run time is extremely fast, after the 
training step. This enables us to characterize the uncertainty of the retrievals by propagating the uncertainties in the input variables 
through the network. For this purpose, we set up a 10,000 bootstrap experiment and run the WECANN retrieval by adding error 
to input variables. The errors are normally distributed with mean zero and a standard deviation that depends on the input variable. 


For SIF, air temperature and soil moisture, we use the error estimates or standard deviations reported in their associated products. 
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These errors are spatially and temporally varying and we used the associated value for each time and space data point. For net 
radiation, we use a constant standard deviation of 34.58 W m7 based on the analysis by (Pan et al., 2015). For precipitation and 
SWE estimates, we use a conservative 10% of the estimates themselves as a standard deviation for error. For each bootstrap 
replicate, we sample from the error distribution of each input variable and add that to the input. 

Figure 13 shows the results of the bootstrap for each of the LE, H and GPP globally and in different climatic zones (defined in 
Section 4.3). Each panel in Figure 13 shows the uncertainty derived from the bootstrap experiment, relative to the interannual 
variability of the retrievals. GPP estimates are provided in units of PgC yr! as total productivity in each region. LE and H are 
provided in units of W m” as an average rate of flux in each region. 

At global scale the GPP ranges between a minimum of 117.15 + 2.379 PgC yr! in 2015 to a maximum of 124.82 + 2.482 PgC yr 
‘in 2007. Similarly, LE has a minimum of 37.40 + 0.54 W m” in 2015 and a maximum of 38.33 + 0.53 W m7? in 2011. H has a 
maximum of 41.00 + 0.54 W m” in 2015 and a minimum of 39.43 + 0.52 W m? in 2011. 

The inter-annual variations of surface fluxes and GPP show distinct patterns. For example, in year 2015, which was an El Nifio 
year, LE and GPP have reduced notably, and H increased to an extreme value in the 9 years of WECANN product. Moreover, from 
2011 to 2015 both LE and GPP have a consistent decreasing trend at global scale. The inter-annual variability of GPP and LE are 
similar at global scale while their regional patterns are different. For example, in year 2015 GPP at global scale and in all regions 
has decreased with respect to 2014, while LE in Polar and NH mid-latitudes have increased and LE at global scale has decreased. 
As expected, the variability of LE and H are anti-correlated. We note that while WECANN is trained on three independent estimates 


of LE, H and GPP, its inter-annual variability is driven by remote sensing observations that are input to the ANN. 


4.7 Impact of SIF on the retrieval of surface fluxes and GPP 


Satellite SIF observations are relatively new, and have not been used to estimate LE and H at the global scale previously. Therefore, 
we want to assess the information content of SIF observations in the WECANN retrievals by replacing them with more typical 
optical/near-infrared indices of vegetation (NDVI or EVI). 

To do so, we trained two different ANNs with NDVI and EVI instead of SIF data on each of the three variables (LE, H and GPP) 
and evaluated the retrievals against the same FLUXNET tower measurements used in Section 4.2 for evaluating WECANN 
retrievals. Tables S4 - S6 show the results of evaluations of these three retrievals against the tower measurements for LE, H and 
GPP, respectively. In terms of correlation coefficient, on average all three retrievals have relatively similar performance except in 
regions where phenology (and incident radiation) is not the main contributor to the flux variability such as in Spain (ES-LgS). 
Indeed, in such regions changes in canopy structure is more limited and changes in response to water stress (through changes in 
light and water use efficiency) are the primary reason for the seasonal variability. This emphasizes, similarly to current thinking 
on the SIF signal, that the monthly SIF signal is dominated by incident radiation and canopy structure but that in some conditions 
light use efficiency changes are detected by SIF, but not optical vegetation indices (Lee et al., 2013). We also point out that current 
SIF retrievals (such as those from GOME-2 used here) are still noisy as they were not obtained by satellites designed to measure 
SIF. Future SIF designated missions such as Fluorescence Explorer (FLEX) will have higher accuracy and finer spatial and 
temporal resolution (Drusch et al., 2016). We expect they will further enhance the retrievals of surface fluxes and GPP such as 


those from WECANN. 
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5 Conclusion 


This study introduces a new statistical approach to retrieve global surface latent and sensible heat fluxes as well as gross primary 
productivity using remotely sensed observations at a monthly time scale. The methodology is developed based on an Artificial 
Neural Network (ANN) that uses six input datasets including solar induced fluorescence (SIF), precipitation, net radiation, soil 
moisture, snow water equivalent, and air temperature. Moreover a Bayesian approach is implemented to optimally integrate 
information from three target datasets for training the ANN using Triple Collocation to calculate a priori probabilities for each of 
the three target datasets based on their uncertainty estimates. 

The new global product, referred to as WECANN, is evaluated using target datasets as well as FLUXNET tower observations. The 
evaluation results comparing with training datasets show that our retrieval has similar correlation with the three products while it 
has the smallest RMSD with FLUXNET-MTE for LE (RMSD=6.42 W m7’), H (RMSD=7.84 W m7”) and GPP (RMSD=0.88 gC 
m” day"), which is believed to be one of the most realistic global datasets and it has the lowest RMSE based on our TC error 
estimates (Fig. S4 — Fig. S6), despite its reported underestimated inter-annual variability due to the use of climatological values 
for several meteorological drivers (Miralles et al., 2014a, 2016). Such tendency also can be summarized from the global difference 
maps, which show that FLUXNET-MTE has the best agreement with WECANN retrievals. The WECANN and FLUXNET-MTE 
approaches are both based on machine learning, although the FLUXNET-MTE retrievals use a regression tree rather than an ANN. 
Nevertheless, this commonality of methods may also contribute to the greater correspondence between these two datasets. 

The retrieval maps indicate that LE, H and GPP have similar seasonal variability and distribution which are determined by annual 
phenological cycle in energy limited Northern latitude regions, dryness in Mediterranean and Monsoonal climates and by light 
availability in rainforests. Seasonal radiation has great impact on some regions for all three variables, such as Eastern U.S., Europe 
and East Asia, which have wet conditions, are highly vegetated and located in mid-latitudes. As opposed to this, the seasonal 
variability of LE, H and GPP in some low-latitude and wet condition regions, such as Amazon rainforest, Southern Africa and 
Southeast Asia, as well as some low-latitude arid regions, such as Southwest U.S., Western Australia, North Africa and Western 
Asia are not significant, as there is less seasonal solar radiation variability in aforementioned regions. Comparison between the LE, 
H, and GPP, they all demonstrate generally similar patterns of seasonal variability through time. 

We also assessed the impact of SIF on retrieval quality. In comparison to optical-based vegetation indices, SIF has better 
performance in regions where phenology and incident radiation are not the main contributor to flux variability, while it has similar 
performance in other regions. 

From the evaluation results comparing with FLUXNET tower observations, it is noted that WECANN has better performance 
compared to other global products. LE and H estimates from WECANN are more consistent with tower observations compared to 
GPP. WECANN retrievals have better correlation with tower observations in 76% of site for LE, 54% of sites for H, and 53% of 
sites for GPP compared to other products. Moreover, retrievals from WECANN outperform other global products in capturing the 
seasonality of surface fluxes and GPP across a wide range of sites with different climatic and biome conditions. 

We also assessed the performance of WECANN in capturing extreme heatwave and drought events, and showed that in the case 
of Russia 2010 heatwave, Texas 2011 drought and US Corn Belt 2012 drought WECANN properly captures the extent of the 
anomalies in LE, H and GPP. Moreover, an independent ET estimate from a water budget closure model was used to evaluate 


WECANN ET estimates across five large basins, and it showed small to moderate errors for WECANN retrievals. 
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Data Availability 


WECANN product is publicly available for download on Aura Validation Data Center (AVDC) at Goddard Space Flight Center 
via https://avdc.gsfc.nasa.gov/pub/data/project!.WECANN/ 
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Input Hidden Output 


O = f,(£,(WP + b)) 


Figure 1: Architecture of the ANN layers. Input layer provides the matrix P of the inputs to the Hidden layer. Hidden layer has a matrix 
W of weights and b of biases for the neurons, and the fi transfer function. The output of the Hidden layer (a = f1(WP +b) ) is an input to 
5 the Output layer that applies the transfer function f2 to the estimates and generates final outputs O. 
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Figure 2: Left column: Annual average retrievals in 2011 for (a) LE, (b) H, and (c) GPP. Right column: Density scatterplot between 


estimates of ANN and target data for (d) LE, (e) H, and (f) GPP during the validation period (2011). The density of scatter points is 
represented by the shading color. The diagonal black line depicts the 1:1 relationship. 
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Figure 3: Global patterns of seasonal average LE from WECANN in 2011, (a) December - February, (b) March - May, (c) June - August, 
and (d) September - November. 
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5 Figure 4: Similar to Figure 3 but for H instead of LE 
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Figure 6: Correlation coefficient (R?) between WECANN retrievals and FLUXNET tower estimates categorized across different plant 
functional types for (a) LE, (b) H, and (c) GPP. Markers show mean, and whiskers show one standard deviation intervals. 
(CRO=Croplands, DBF=Deciduous Broadleaf Forests, EBF=Evergreen Broadleaf Forests, ENF=Evergreen Needleleaf Forests, 


GRA=Grasslands, MF=Mixed Forests, SA V=Savannas, and WET=Permanent Wetlands) 
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Figure 7: Comparison of the retrievals with eddy covariance observations of LE, H and GPP across 5 sites (a) US-ARM site, USA, (b) 
AT-Neu site, Austria, (c) BE-Bra site, Belgium, (d) FI-Hyy site, Finland, (e) US-SRG, USA, and (f) ZA-Kru, South Africa 
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Figure 8: Difference between annual mean LE retrieved by WECANN and the three target datasets (a-c). Scatter plots of LE retrieved 
from WECANN vs. from each of the target datasets (d-f). Data used are from 2011. 
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Figure 9: Similar to Figure 8 but for H instead of LE 
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Figure 10: Similar to Figure 8 but for GPP instead of LE 
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Figure 11: Mean monthly anomalies (in percentage with respect to mean value) for three extreme heatwave events. 


31 


Relative Absolute Difference between ETwecann and ETmode; (2007-2010) 


cy 40 4 
vo 
2 304 
4 
£ 20 
fa) 
£101 
si ¢ 
Z 04 
' ' - 1 : 
Amazon Colorado Congo Mississipi Orinoco 


Basin Name 


Figure 12: Relative absolute difference between ET estimates of WECANN compared to modeled ET from basin scale water budget 
closure. Markers show mean, and whiskers show one standard deviation intervals. 
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Figure 13: Annual mean estimates and uncertainty bounds of LE (top row), H (middle row) and GPP (bottom row) retrievals at global (left column) and regional (four right columns) 
scales between 2007 and 2015. The central line in each box indicates the mean, the edges of the box are 25th and 75th percentiles, and the whiskers show the most extreme values. 
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Table 1: Characteristics of products used for training of ANN 


Product Output variables Temporal Spatial Temporal Spatial Berens 
used for training Coverage Coverage Resolution Resolution 
GLEAM LE, H 1980 - 2015 Global Daily 0.25° x 0.25° eae x 
ECMWE ERA ‘ é % Balsamo et 
HTESSEL LE, H, GPP 2008 - 2015 Global Daily 0.25° x 0.25 al., 2009 
5 4 Jung et al., 
FLUXNET-MTE | LE, H, GPP 1982 - 2012 Global Monthly 0.5° x 0.5 2009 
MODIS-GPP GPP 2000 - 2015 Global Monthly 0.5° x 0.5° ae ss 
Table 2: Characteristics of observations used as input in the WECANN product 
Waaenis Product N ame and Temporal Spatial Temporal Spatial Repszente 
Version Coverage Coverage Resolution Resolution 
GOME-2 : 3 5 Joiner et al., 
SIF hice nen eee 2007-present Global Daily 0.5° x 0.5 013 
Net Radiation | CERES L3 SYN Ideg | 2002-present Global Monthly 1°x 1° eee ME 
Air : ree Aumann et 
Temperature AIRS3STD v6.0 2002-present Global Daily 1°xl al., 2003 
Soil Moisture | ESA-CCI v2.3 1978-2015 Global Daily 0.25° x 0.25° aa 5 al; 
Precipitation | GPCP 1DD v1.2 1996-2015 Global Daily 1°x 1° eee se 
Snow Water | GLOBSNOW L3A v2. | 1979-present Global Daily 25 km x 25 km | DUojus etal. 
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